fix(crashtracker): move preload logger marking after recursive guard#2023
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
🎉 All green!🧪 All tests passed 🎯 Code Coverage (details) 🔗 Commit SHA: 08ea319 | Docs | Datadog PR Page | Give us feedback! |
📚 Documentation Check Results📦
|
Clippy Allow Annotation ReportComparing clippy allow annotations between branches:
Summary by Rule
Annotation Counts by File
Annotation Stats by Crate
About This ReportThis report tracks Clippy allow annotations for specific rules, showing how they've changed in this PR. Decreasing the number of these annotations generally improves code quality. |
🔒 Cargo Deny Results📦
|
118e192 to
40de8cc
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2023 +/- ##
==========================================
- Coverage 72.83% 72.80% -0.04%
==========================================
Files 458 458
Lines 75789 75789
==========================================
- Hits 55203 55175 -28
- Misses 20586 20614 +28
🚀 New features to boost your workflow:
|
40de8cc to
3a72305
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. Keep them coming! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
/merge |
|
View all feedbacks in Devflow UI.
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
The expected merge time in
The merge request has been interrupted because the build 65994034765863092 took longer than expected. The current limit for the base branch 'main' is 120 minutes. |
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|

What does this PR do?
We saw a crash as such, where the stacktrace shows failures within the crashtracker itself, looking like
handle_posix_sigaction->dlysm-> ... ->handle_posix_sigaction->dlysm-> ... -> ...This is because
mark_preload_logger_collector()inhandle_posix_sigactionis called before the one-time guardNUM_TIMES_CALLED. Ifdlsyminmark_preload_logger_collector()fails, thenhandle_posix_sigaction->handle_posix_signal_implmark_preload_logger_collector()callsdlsym-> SIGBUS again (same faulty mapping)handle_posix_sigaction->handle_posix_signal_implNote that the production failure might not have been
dd_preload_logger_mark_collectoritself, butdlsymfailure while iterating through dyn loaded libs. The important part is that dlsym can fail and this will retrigger crashtrackerMotivation
What inspired you to submit this pull request?
Additional Notes
AI generated the script to repro
How to test the change?
Reproduced by running a script that loaded in
dd_preload_logger_mark_collectorsymbol that failed after N times, which the previous times, triggers a SIGBUS. Basically, the gist is, we should not do anything after we enter the crash handler, before we check the one time guard